A Two-Stage Approach for Generating Unbiased Estimates of Text Complexity

Authors

  • Kathleen M. Sheehan
  • Michael Flor
  • Diane Napolitano
Abstract

Many existing approaches for measuring text complexity tend to overestimate the complexity levels of informational texts while simultaneously underestimating the complexity levels of literary texts. We present a two-stage estimation technique that successfully addresses this problem. At Stage 1, each text is classified into one of three possible genres: informational, literary or mixed. At Stage 2, a complexity score is generated for each text by applying one of three possible prediction models: one optimized for informational texts, one optimized for literary texts, and one optimized for mixed texts. Each model combines lexical, syntactic and discourse features, as appropriate, to best replicate human complexity judgments. We demonstrate that the resulting text complexity predictions are both unbiased and highly correlated with classifications provided by experienced educators.
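
The two-stage flow described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the `narrativity` feature, the genre thresholds, the linear form of the models, and all weights are hypothetical placeholders chosen to show the control flow (classify the genre first, then apply the genre-specific model).

```python
from typing import Callable, Dict, Tuple

Features = Dict[str, float]


def classify_genre(features: Features) -> str:
    """Stage 1 (toy rule): pick a genre from a hypothetical 'narrativity' feature.

    The thresholds 0.35 and 0.65 are made up for illustration only.
    """
    narrativity = features["narrativity"]
    if narrativity < 0.35:
        return "informational"
    if narrativity > 0.65:
        return "literary"
    return "mixed"


def make_linear_model(intercept: float,
                      weights: Dict[str, float]) -> Callable[[Features], float]:
    """Build a simple linear scorer over named features (an assumed model form)."""
    def predict(features: Features) -> float:
        return intercept + sum(w * features[name] for name, w in weights.items())
    return predict


# Stage 2: one model per genre. All coefficients are hypothetical.
MODELS: Dict[str, Callable[[Features], float]] = {
    "informational": make_linear_model(
        1.0, {"lexical": 2.0, "syntactic": 1.0, "discourse": 0.5}),
    "literary": make_linear_model(
        2.0, {"lexical": 1.0, "syntactic": 2.0, "discourse": 1.0}),
    "mixed": make_linear_model(
        1.5, {"lexical": 1.5, "syntactic": 1.5, "discourse": 0.75}),
}


def estimate_complexity(features: Features) -> Tuple[str, float]:
    """Run Stage 1, then score the text with the model for the predicted genre."""
    genre = classify_genre(features)
    return genre, MODELS[genre](features)


if __name__ == "__main__":
    text_features = {"narrativity": 0.2, "lexical": 1.0,
                     "syntactic": 2.0, "discourse": 2.0}
    print(estimate_complexity(text_features))
```

Routing each text to a genre-specific model is what lets the same feature values map to different scores for informational and literary texts, which is the mechanism the abstract credits for removing the genre bias.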


Similar resources

The Impact of Summary Writing with Structure Guidelines on EFL College Students’ Rhetorical Organization: Integrating Genre-Based and Process Approaches

This study aimed at investigating the impact of writing on Iranian EFL college students’ rhetorical organization. Thirty Iranian female undergraduate students majoring in English at Al-zahra University participated in the current study. The writing instructions included two stages, each lasting for four weeks. The participants were assigned to a control group and an experimental group according...


A New Approach Generating Robust and Stable Schedules in m-Machine Flow Shop Scheduling Problems: A Case Study

This paper considers a scheduling problem with uncertain processing times and machine breakdowns in industrial/office workplaces and solves it via a novel robust optimization method. In the traditional robust optimization, the solution robustness is maintained only for a specific set of scenarios, which may worsen the situation for new scenarios. Thus, a two-stage predictive algorithm is prop...


Stage specialization for design and analysis of flotation circuits

This paper presents a new approach for flotation circuit design. Initially, it was proven numerically and analytically that in order to achieve the highest recovery in different circuit configurations, the best equipment must be placed at the beginning stage of the flotation circuits. The size of the entering particles and the types of streams including pulp and froth were considered as the bas...


Bi-objective Optimization for Just in Time Scheduling: Application to the Two-Stage Assembly Flow Shop Problem

This paper considers a two-stage assembly flow shop problem (TAFSP) where m machines are in the first stage and an assembly machine is in the second stage. The objective is to minimize a weighted sum of earliness and tardiness time for n available jobs. JIT seeks to identify and eliminate waste components including over production, waiting time, transportation, inventory, movement and defective...


Syntactic Complexity of Russian Unified State Exam Texts in English: A Study on Reliability and Validity

In this study we analyze texts used in the Russian Unified State Exam (USE) in English. The texts, which form two small research corpora, were retrieved from two sources: the official USE database, used as a reference point, and “Neznaika” (https://neznaika.pro/), a popular website used by pupils for USE training. The two corpora are balanced in size: USE has 11934 tokens and “Neznaika” has 11918 tokens. We share Biber’...




Publication date: 2013